An Analysis on National Maternal Mortality
2025-08-07
Generalized Linear Mixed Models (GLMMs) are a flexible class of statistical models that combine the features of two powerful tools: Generalized Linear Models (GLMs) and Mixed-Effects Models (Agresti 2015).
Can model non-normal outcome variables, such as binary, count, or proportion data
Incorporate random effects, which account for variation due to grouping or clustering in the data, correlated observations, and overdispersion
Handling hierarchical or grouped data (e.g., students within classrooms, patients within clinics) (Lee and Nelder 1996)
Modeling non-normal outcomes, such as:
Binary outcomes (using logistic GLMMs) (Wang et al. 2017)
Count data (using Poisson or negative binomial GLMMs) (Candy 2000)
Proportions or rates (Salinas Ruíz et al. 2023)
Let
\(\mathbf{y}\) be an \(N \times 1\) column vector of the outcome variable
\(\mathbf{X}\) be an \(N \times p\) matrix of the \(p\) predictor variables
\(\boldsymbol{\beta}\) be a \(p \times 1\) column vector of the fixed-effects coefficients
\(\mathbf{Z}\) be an \(N \times q\) design matrix for the \(q\) random effects
\(\mathbf{u}\) be a \(q \times 1\) vector of random effects, and
\(\boldsymbol{\epsilon}\) be an \(N \times 1\) column vector of the residuals
Then the general equation for the model is given by:
\[\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}+\boldsymbol{\epsilon}\]
GLMMs typically include a link function that relates the expected response to a linear predictor, \(\boldsymbol{\eta}\), which excludes the residuals: \[\boldsymbol{\eta}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\mathbf{u}\]
The link function is \(g(\cdot)\), where \[g(E(\mathbf{y}))=\boldsymbol{\eta}\] and \(E(\mathbf{y})\) is the expectation of \(\mathbf{y}\). The choice of link function depends on the outcome distribution. For this paper, our data are overdispersed counts that we model with a negative binomial distribution, so we use a log link function.
\[g(\cdot)=\log_e(\cdot)\]
The response variable and the predictors have a linear relationship within the levels of random effects.
The response variable is assumed to follow a negative binomial distribution, with \(\sigma^2>\mu\).
The residuals and random effects are independent.
The random effects are assumed to be normally distributed, with mean 0 and variance \(\sigma_u^2\).
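These assumptions can be written compactly. The sketch below assumes the `nbinom2` parameterization named in the model output, in which the variance is quadratic in the mean. The symbols \(\theta\) (dispersion parameter), \(\sigma_u^2\) (random-intercept variance), and the row vectors \(\mathbf{x}_i^\top\), \(\mathbf{z}_i^\top\) are notation introduced here for illustration.

\[ \begin{align*} y_i \mid \mathbf{u} &\sim \text{NegBin}(\mu_i, \theta), & \operatorname{Var}(y_i \mid \mathbf{u}) &= \mu_i + \frac{\mu_i^2}{\theta} \\ \log(\mu_i) &= \mathbf{x}_i^\top\boldsymbol{\beta} + \mathbf{z}_i^\top\mathbf{u}, & u_j &\sim N(0, \sigma_u^2) \end{align*} \]

Because \(\mu_i^2/\theta > 0\), the conditional variance always exceeds the mean, and the model approaches a Poisson model as \(\theta \to \infty\).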
The negative binomial distribution suits overdispersed count data (which we suspect here, as these are population data)
Longitudinal observations are not independent, so a GLMM is needed to include time (Year) as a random effect
The random effect accounts for variation that our fixed effects would not explain
Analysis performed with R (R Core Team 2019)
Vital Statistics Rapid Release (VSRR) Provisional Maternal Death Counts and Rates, in the form of a .csv
Published by National Vital Statistics System, a collaboration between the National Center for Health Statistics (NCHS) and state vital record offices
Monthly death counts and death rates by race/ethnicity, age, and overall
Data from January 2019 to December 2024
Data is provisional and updated quarterly; becomes more reliable with more updates
Maternal death counts between 1 and 9 are suppressed for privacy reasons
“Native Hawaiian or Other Pacific Islander, Non-Hispanic” has 70 NAs for Maternal Mortality, so this subgroup is omitted entirely
“American Indian or Alaska Native, Non-Hispanic” has 58 NAs for Maternal Mortality Rate; because rate is not used in our model, these missing values do not affect modeling
| Name | Fixed_Effects | Random_Effects | Offset |
|---|---|---|---|
| all_glmmodel_nb | Ethnicity, Age_Group, Dobbs_Era | Year | log(Live_Births) |
| ethnicity_agegroup_glmmodel_nb | Ethnicity, Age_Group | Year | log(Live_Births) |
| allno_glmmodel_nb | Ethnicity, Age_Group, Dobbs_Era | Year | None |
| ethnicity_agegroupno_glmmodel_nb | Ethnicity, Age_Group | Year | None |
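
The Offset column determines whether a model describes counts or rates. A short derivation, assuming the log link and a random intercept for Year, and writing \(B_i\) for the live births in observation \(i\) (a symbol introduced here for illustration):

\[ \begin{align*} \log(E[y_i \mid \mathbf{u}]) &= \log(B_i) + \mathbf{x}_i^\top\boldsymbol{\beta} + u_{\text{Year}[i]} \\ \frac{E[y_i \mid \mathbf{u}]}{B_i} &= \exp\left(\mathbf{x}_i^\top\boldsymbol{\beta} + u_{\text{Year}[i]}\right) \end{align*} \]

With the offset, exponentiated coefficients compare maternal deaths per live birth; without it, they compare raw death counts.
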
Family: nbinom2 ( log )
Formula:
Maternal_Deaths ~ Ethnicity + Age_Group + Dobbs_Era + (1 | Year)
Data: deaths_df3
Offset: log(Live_Births)
AIC BIC logLik -2*log(L) df.resid
4567.5 4614.2 -2272.7 4545.5 507
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
Year (Intercept) 0.03158 0.1777
Number of obs: 518, groups: Year, 6
Dispersion parameter for nbinom2 family (): 148
Conditional model:
Estimate Std. Error
(Intercept) -8.79687 0.07678
EthnicityBlack, Non-Hispanic 1.31563 0.02627
EthnicityWhite, Non-Hispanic 0.28329 0.02604
EthnicityHispanic 0.16204 0.02704
EthnicityAmerican Indian or Alaska Native, Non-Hispanic 1.65241 0.06285
EthnicityUnknown 0.04364 0.02750
Age_Group25-39 years 0.38817 0.01821
Age_Group40 years and over 1.81031 0.02053
Age_GroupUnknown NA NA
Dobbs_EraPost-Dobbs -0.22081 0.02303
z value Pr(>|z|)
(Intercept) -114.57 < 2e-16 ***
EthnicityBlack, Non-Hispanic 50.08 < 2e-16 ***
EthnicityWhite, Non-Hispanic 10.88 < 2e-16 ***
EthnicityHispanic 5.99 2.06e-09 ***
EthnicityAmerican Indian or Alaska Native, Non-Hispanic 26.29 < 2e-16 ***
EthnicityUnknown 1.59 0.113
Age_Group25-39 years 21.31 < 2e-16 ***
Age_Group40 years and over 88.17 < 2e-16 ***
Age_GroupUnknown NA NA
Dobbs_EraPost-Dobbs -9.59 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
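Because the model uses a log link, the estimates above are on the log scale. A minimal sketch of converting them to incidence rate ratios is shown below, assuming a Wald interval with z = 1.96; the helper `rate_ratio` and the shortened labels are hypothetical names used only for this illustration, and the reference categories are whichever levels the fitted model treated as baseline.

```python
import math

# Estimates and standard errors copied from the summary output above.
coefs = {
    "Black, Non-Hispanic": (1.31563, 0.02627),
    "Age 40 years and over": (1.81031, 0.02053),
    "Post-Dobbs": (-0.22081, 0.02303),
}

def rate_ratio(estimate, std_error, z=1.96):
    """Return (IRR, lower, upper) using a Wald interval on the log scale."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )

irrs = {name: rate_ratio(est, se) for name, (est, se) in coefs.items()}
for name, (irr, lower, upper) in irrs.items():
    print(f"{name}: IRR {irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")

# With the log(Live_Births) offset, exp(intercept) is a rate per live birth
# for the reference categories in a year whose random intercept is zero.
baseline_per_100k = math.exp(-8.79687) * 100_000
print(f"Baseline rate: {baseline_per_100k:.1f} per 100,000 live births")
```

Read this way, the Black, Non-Hispanic estimate corresponds to a rate roughly 3.7 times that of the reference ethnicity, holding age group and Dobbs era fixed.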
Model selection table
cnd((Int)) dsp((Int)) cnd(Age_Grp)
allno_glmmodel_nb 3.516 + +
all_glmmodel_nb -8.797 + +
ethnicity_agegroupno_glmmodel_nb 3.425 + +
ethnicity_agegroup_glmmodel_nb -8.887 + +
cnd(Dbb_Era) cnd(Eth) cnd(off(log(Liv_Brt)))
allno_glmmodel_nb + +
all_glmmodel_nb + + +
ethnicity_agegroupno_glmmodel_nb +
ethnicity_agegroup_glmmodel_nb + +
offset df logLik AICc delta weight
allno_glmmodel_nb 11 -2269.287 4561.1 0.00 0.969
all_glmmodel_nb l(L_B) 11 -2272.739 4568.0 6.90 0.031
ethnicity_agegroupno_glmmodel_nb 10 -2311.825 4644.1 82.99 0.000
ethnicity_agegroup_glmmodel_nb l(L_B) 10 -2313.794 4648.0 86.93 0.000
Abbreviations:
offset: l(L_B) = 'log(Live_Births)'
Models ranked by AICc(x)
Random terms (all models):
cond(1 | Year)
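The AICc, delta, and weight columns can be recomputed from the log-likelihoods and degrees of freedom in the table. The sketch below uses the standard small-sample correction and Akaike-weight formulas with n = 518 observations; the function names `aicc` and `akaike_weights` are hypothetical helpers, not part of the original analysis.

```python
import math

def aicc(log_lik, k, n):
    """Small-sample corrected AIC for k estimated parameters and n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)

def akaike_weights(scores):
    """Convert information-criterion scores to relative model weights."""
    best = min(scores)
    rel = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]

n_obs = 518
# (logLik, df) pairs copied from the model selection table.
models = {
    "allno_glmmodel_nb": (-2269.287, 11),
    "all_glmmodel_nb": (-2272.739, 11),
    "ethnicity_agegroupno_glmmodel_nb": (-2311.825, 10),
    "ethnicity_agegroup_glmmodel_nb": (-2313.794, 10),
}

scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in models.items()}
weights = dict(zip(scores, akaike_weights(list(scores.values()))))
best_score = min(scores.values())

for name in models:
    delta = scores[name] - best_score
    print(f"{name}: AICc {scores[name]:.1f}, delta {delta:.2f}, weight {weights[name]:.3f}")
```

The weight is the relative support for each model within this candidate set only; it says nothing about models that were not fitted.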
Our chosen model in regression equation format:
\[ \begin{align*} \log(\mathbb{E}[\text{Maternal Deaths}_i]) &= \beta_0 + \beta_1 \cdot \text{Black}_i \\ &+ \beta_2 \cdot \text{White}_i + \beta_3 \cdot \text{Hispanic}_i \\ &+ \beta_4 \cdot \text{American Indian or Alaska Native}_i \\ &+ \beta_5 \cdot \text{EthnicityUnknown}_i \\ &+ \beta_6 \cdot \text{Age 25-39}_i + \beta_7 \cdot \text{Age 40 Plus}_i \\ &+ \beta_8 \cdot \text{Post Dobbs}_i + b_{\text{Year}[i]} + \log(\text{Live Births}_i) \end{align*} \]
mean(Maternal_Deaths) var(Maternal_Deaths)
1 828.8889 30819.62
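A quick way to read these two numbers is as a variance-to-mean ratio, which is about 1 for Poisson data. The sketch below is a crude marginal check that ignores the predictors, so it only suggests overdispersion rather than proving it; the variable names are illustrative.

```python
# Sample mean and variance of Maternal_Deaths, copied from the output above.
mean_deaths = 828.8889
var_deaths = 30819.62

# Under a Poisson model the variance equals the mean, so this ratio would be ~1.
dispersion_ratio = var_deaths / mean_deaths
print(f"Variance-to-mean ratio: {dispersion_ratio:.2f}")

# A ratio well above 1 is consistent with the choice of a negative binomial family.
is_overdispersed = dispersion_ratio > 1
print(f"Overdispersed relative to Poisson: {is_overdispersed}")
```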
| Maternal Deaths | | | |
|---|---|---|---|
| Predictors | Incidence Rate Ratios | CI | p |
| (Intercept) | 33.65 | 29.12 – 38.88 | <0.001 |
| Ethnicity [Black, Non-Hispanic] | 8.67 | 8.24 – 9.13 | <0.001 |
| Ethnicity [White, Non-Hispanic] | 11.09 | 10.54 – 11.67 | <0.001 |
| Ethnicity [Hispanic] | 4.81 | 4.56 – 5.07 | <0.001 |
| Ethnicity [American Indian or Alaska Native, Non-Hispanic] | 0.62 | 0.54 – 0.70 | <0.001 |
| Ethnicity [Unknown] | 3.80 | 3.60 – 4.01 | <0.001 |
| Age Group [25-39 years] | 4.95 | 4.78 – 5.12 | <0.001 |
| Age Group [40 years and over] | 1.04 | 1.00 – 1.08 | 0.058 |
| Dobbs Era [Post-Dobbs] | 0.80 | 0.77 – 0.84 | <0.001 |
| Random Effects | | | |
| σ² | 0.01 | | |
| τ₀₀ Year | 0.03 | | |
| ICC | 0.73 | | |
| N Year | 6 | | |
| Observations | 518 | | |
| Marginal R² / Conditional R² | 0.957 / 0.988 | | |
| Maternal Deaths | | | |
|---|---|---|---|
| Predictors | Incidence Rate Ratios | CI | p |
| (Intercept) | 0.00 | 0.00 – 0.00 | <0.001 |
| Ethnicity [Black, Non-Hispanic] | 3.73 | 3.54 – 3.92 | <0.001 |
| Ethnicity [White, Non-Hispanic] | 1.33 | 1.26 – 1.40 | <0.001 |
| Ethnicity [Hispanic] | 1.18 | 1.12 – 1.24 | <0.001 |
| Ethnicity [American Indian or Alaska Native, Non-Hispanic] | 5.22 | 4.61 – 5.90 | <0.001 |
| Ethnicity [Unknown] | 1.04 | 0.99 – 1.10 | 0.113 |
| Age Group [25-39 years] | 1.47 | 1.42 – 1.53 | <0.001 |
| Age Group [40 years and over] | 6.11 | 5.87 – 6.36 | <0.001 |
| Dobbs Era [Post-Dobbs] | 0.80 | 0.77 – 0.84 | <0.001 |
| Random Effects | | | |
| σ² | 8.00 | | |
| τ₀₀ Year | 0.03 | | |
| ICC | 0.00 | | |
| N Year | 6 | | |
| Observations | 518 | | |
| Marginal R² / Conditional R² | 0.056 / 0.059 | | |
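
The ICC rows can be related to the variance components listed beside them. The sketch below uses the common definition ICC = τ₀₀ / (τ₀₀ + σ²); whether the table software applied exactly this formula is an assumption, and the inputs are the rounded values printed in the two tables, so small discrepancies from the reported ICCs are expected.

```python
def icc(tau00, sigma2):
    """Share of total variance attributed to the grouping factor (Year)."""
    return tau00 / (tau00 + sigma2)

# Rounded variance components as printed in the two tables above.
icc_first_table = icc(tau00=0.03, sigma2=0.01)
icc_second_table = icc(tau00=0.03, sigma2=8.00)

print(f"First table:  ICC ~ {icc_first_table:.2f}")
print(f"Second table: ICC ~ {icc_second_table:.3f}")
```

With these rounded inputs the first table gives about 0.75 rather than the reported 0.73, presumably because the printed variance components are rounded, while the second gives a value that rounds to the reported 0.00.
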
The data used are provisional and are updated quarterly with both new and revised counts, so further analysis may yield different results.
The data offered counts by age group and by ethnicity, but not by both together (e.g., maternal deaths for Black women aged 40 and over). Including such data would give a better indication of the relationship between the two subgroups.
The COVID-19 pandemic restricted access to regular healthcare. This likely affected maternal mortality and may partially account for increased rates from 2020 to 2023.
More variables in the dataset would offer a better picture of the predictors of maternal mortality, specifically with regard to their relationship with ethnicity. Prenatal health, healthcare access, abortion access, and other prenatal behaviors would be useful.